Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Appears in the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XII)

Authors

  • Koushik Chakraborty
  • Philip M. Wells
  • Gurindar S. Sohi
Abstract

In canonical parallel processing, the operating system (OS) assigns a processing core to a single thread from a multithreaded server application. Since different threads from the same application often carry out similar computation, albeit at different times, we observe extensive code reuse among different processors, causing redundancy (e.g., in our server workloads, 45–65% of all instruction blocks are accessed by all processors). Moreover, largely independent fragments of computation compete for the same private resources, causing destructive interference. Together, this redundancy and interference lead to poor utilization of private microarchitecture resources such as caches and branch predictors. We present Computation Spreading (CSP), which employs hardware migration to distribute a thread’s dissimilar fragments of computation across the multiple processing cores of a chip multiprocessor (CMP), while grouping similar computation fragments from different threads together. This paper focuses on a specific example of CSP for OS-intensive server applications: separating application-level (user) computation from the OS calls it makes. When performing CSP, each core becomes temporally specialized to execute certain computation fragments, and the same core is repeatedly used for such fragments. We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27–58%, private L2 load misses by 0–19%, and branch mispredictions by 9–25%.
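The idea of separating user computation from OS calls can be illustrated with a toy model. The sketch below is not the paper's hardware mechanism, nor either of its two assignment policies, which the abstract does not describe. It assumes a hypothetical workload in which every thread alternates user and OS fragments, each touching a fixed set of instruction blocks, and compares the per-core instruction footprint when threads are pinned to cores against a CSP-like split in which half the cores run only user fragments and the other half only OS fragments. All names, block counts, and the half-and-half split are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical instruction-block ids touched by each fragment kind.
USER_BLOCKS = set(range(0, 100))
OS_BLOCKS = set(range(100, 200))


def make_trace(num_threads, fragments_per_thread):
    """Build a trace of (thread_id, kind) pairs; threads alternate user and OS fragments."""
    trace = []
    for i in range(fragments_per_thread):
        kind = "user" if i % 2 == 0 else "os"
        for t in range(num_threads):
            trace.append((t, kind))
    return trace


def baseline_core(thread_id, kind, num_cores):
    """Conventional assignment: a thread stays on one core for all of its fragments."""
    return thread_id % num_cores


def csp_core(thread_id, kind, num_cores):
    """CSP-like assignment (assumed split): first half of cores for user code, rest for OS code."""
    half = num_cores // 2
    if kind == "user":
        return thread_id % half
    return half + thread_id % (num_cores - half)


def footprint_per_core(trace, assign, num_cores):
    """Count the distinct instruction blocks each core ends up executing."""
    touched = defaultdict(set)
    for thread_id, kind in trace:
        core = assign(thread_id, kind, num_cores)
        touched[core] |= USER_BLOCKS if kind == "user" else OS_BLOCKS
    return {core: len(blocks) for core, blocks in touched.items()}


if __name__ == "__main__":
    trace = make_trace(num_threads=4, fragments_per_thread=4)
    print("pinned threads:", footprint_per_core(trace, baseline_core, 4))
    print("CSP-like split:", footprint_per_core(trace, csp_core, 4))
```

In this model every core in the pinned configuration executes both user and OS blocks, while each core in the CSP-like configuration executes only one kind, so its instruction footprint is halved. The model ignores migration cost, data locality, and load balance, all of which matter in a real system.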


Similar resources

Program Chair’s Message

The broad goal of a conference is to move the field forward. I was fortunate that recent PC chairs of ASPLOS led very successful conferences, and I mostly followed in their footsteps. Nevertheless I identified two narrower goals for additional efforts: (1) to continue to enhance the image of ASPLOS as a broad, multidisciplinary conference, and (2) to continue to raise the bar for quality and fa...


Hardware Support for Synchronization in the Scalable Coherent Interface (SCI)

The exploitation of the inherent parallelism in applications depends critically on the efficiency of the synchronization and data exchange primitives provided by the hardware. This paper discusses and analyses such primitives as they are implemented in a pending IEEE standard 1596 for communication in a shared memory multiprocessor, the Scalable Coherent Interface (SCI). The SCI synchronization p...


Ninja: A Framework for Network Services

ion for controlling parallelism. DEC SRC Technical Report 42, Palo Alto, California, 1989. [SBL99] Y. Saito, B. Bershad and H. Levy. “Manageability, Availability and Performance in Porcupine: A Highly Scalable, Cluster-based Mail Service.” Proc. of the 17th SOSP. October 1999. [Sco96] S. L. Scott. “Synchronization and Communication in the T3E Multiprocessor.” Proc. of ASPLOS 1996,


Correction to RAMpage ASPLOS Paper

This paper contains corrections to published results on the RAMpage memory hierarchy. The originally published results contained erroneous values for cache miss penalties for a conventional cache architecture against which the RAMpage hierarchy was being compared. The incorrect results showed that RAMpage with context switches on misses to DRAM had similar performance to a conventional 2-way as...


On-line Data Compression in a Log-structured File System. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (asplos Iv), 8 Future Work 5.3 Log Reclamation

swizzling at page fault time: Efficiently and compatibly supporting huge addresses on standard hardware. A performance study of alternative object faulting and pointer swizzling strategies. 18 performance between normal (non-compressed) RAM and disk 26]. 24 We are developing adaptive compression techniques that exploit the typical low information content and word-wise alignment of heap data fields,...




Publication date: 2006